Submit a single zip-compressed file that has the name: YourLastName_Assignment_3 that has the following files:
Formatting Python Code When programming in Python, refer to Kenneth Reitz’ PEP 8: The Style Guide for Python Code: http://pep8.org/ (Links to an external site.)Links to an external site. There is the Google style guide for Python at https://google.github.io/styleguide/pyguide.html (Links to an external site.)Links to an external site. Comment often and in detail.
In his first state of the uniion address , president Trump mentioned Chicago violance 10 times Trump's State of the Union Address
Columnist Clarence Page wrote an article , published by the Chicago Tribune stated that the city of Chicago had more homicides in the past two years than New York and Los Angeles combined
The CSV file for crimes dataset for the city of Chicago is obtained from the data portal for the city of Chicago. Here is the link for the city of Chicago data portal City of Chicago Data Portal
Three datasets are need for this assignment:
Lets load the CSV file into a DataFrame object and see the nature of the data that we have.
Complete description of the dataset can be found on Chicago city data portal.
Based on Trumps State of the Uniion Address and the article written by columnist Clarence Page and published by the Chicago Tribune, we are interested to retrieve the data for the past two years and perform different types of spatial queries.
There are few of these queries that we are interested in to help CPD and city of Chicago to plot on a Choroplteh map those districts that have highest gun crimes.
Here are examples of those types of quereis:
Packages you need to Connect PostgreSQL server to load and retrieve Crhicago Crime dataset from the database:
Since we are using PostGIS in our work, please read and bookmark Chapter 4. Using PostGIS: Data Management and Queries
##The easiest way to install packages that are missing is to create a new cell before this one and add the following code
#import sys
#!{sys.executable} -m pip install folium
#!{sys.executable} -m pip install area
#!{sys.executable} -m pip install psycopg2
##This didn't work for a few students last quarter. This is another method:
#!conda install psycopg2 --y
import folium
from folium import plugins
from folium.plugins import MarkerCluster
import psycopg2
import csv
import pandas as pd
import json
from area import area
from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT
# open the file in mozilla instead of chrome
#jupyter notebook --browser firefox
# All data
db_connection = psycopg2.connect(host='129.105.208.229',dbname="chicago_crimes_ut", user="aso4098" , password="")
# ----------------------------------------------------------------
# unit test data - less data to use for testing purposes
#db_connection = psycopg2.connect(host='129.105.208.229',dbname="chicago_crimes_ut", user="YourNetID" , password="YourPassword")
# -----------------------------------------------------------------
cursor = db_connection.cursor()
The Crimes_2001_to_present.csv is downloaded from Chicago data portal and it has roughly 6.5 million records.
While working on this dataset, It is prudent to make a note of the following:
Sort algorithms used by the database engines vary in performance between O($N log N$) and O($ N^{2} $) where $N$ is the size of the number
Search algorithms used by the database engines vary in performance between O($log N$) and O($ N $) where $N$ is the size of the number
cursor.execute("SELECT district, count(district) FROM crimes GROUP BY district")
rows=cursor.fetchall()
crimes_per_district = pd.DataFrame(rows, columns=['dist_num','number_of_crimes'])
crimes_per_district['dist_num'] = crimes_per_district['dist_num'].astype(str)
crimes_per_district.sort_values(by = "number_of_crimes", ascending = False)
total_number_of_crimes_per_district_map = folium.Map(location =(41.8781, -87.6298),zoom_start=11)
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
total_number_of_crimes_per_district_map.choropleth(geo_data="Boundaries.geojson",
fill_color='OrRd',
fill_opacity=0.5,
line_opacity=1,
# set up legend
data = crimes_per_district,
key_on='feature.properties.dist_num',
columns = ['dist_num', 'number_of_crimes']
)
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)),
district from police_stations where district!='Headquarters'""")
police_stations = cursor.fetchall()
for police_station in police_stations:
police_station_location = (police_station[0],police_station[1])
cursor.execute("SELECT district, count(district) from crimes where district= %s \
GROUP BY district",[police_station[2]])
districts_crime_numbers = cursor.fetchall()
for district in districts_crime_numbers:
folium.Marker(location = police_station_location,popup=
folium.Popup(html="District No : %s has Total Number of Crimes:%s"
%district ,max_width=450)).add_to(total_number_of_crimes_per_district_map);
police_station[0], police_station[1], police_station[2]
total_number_of_crimes_per_district_map
Well, we really need only the violent crimes per district, so we will filter only those crimes that we are interested in. Please note that we are not interested to plot property crimes, we are really after violent crimes and in particular Gun related crimes.
So for now, lets plot violent crimes on Choropleth map and later on we will filter only Gun related crimes
violent_crime_categories='THEFT','ASSAULT','ROBBERY','KIDNAPPING','CRIM SEXUAL ASSAULT','BATTERY','MURDER'
cursor.execute("SELECT district, count(district)\
FROM crimes \
WHERE PRIMARY_TYPE in %s \
GROUP BY district",[violent_crime_categories])
rows=cursor.fetchall()
violent_crime_data=pd.DataFrame(rows, columns=['district_num','number_of_violent_crimes'])
violent_crime_data['district_num'] = violent_crime_data['district_num'].astype(str)
violent_crime_data.sort_values(by = "number_of_violent_crimes", ascending = False)
violent_crimes_per_district_map= folium.Map(location =(41.8781, -87.6298),zoom_start=11)
violent_crimes_per_district_map.choropleth(geo_data="Boundaries.geojson",
fill_color='YlOrRd',
fill_opacity=0.5,
line_opacity=1,
# note to change of data source to set up legend
data = violent_crime_data,
key_on='feature.properties.dist_num',
columns = ['district_num', 'number_of_violent_crimes'],
legend_name="VOILENT CRIME MAP"
)
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)),
district from police_stations where district!='Headquarters'""")
police_stations = cursor.fetchall()
for police_station in police_stations:
police_station_location =(police_station[0],police_station[1])
cursor.execute("SELECT PRIMARY_TYPE, count(PRIMARY_TYPE) \
FROM crimes where district =%s AND PRIMARY_TYPE in %s \
GROUP BY PRIMARY_TYPE",[police_station[2],violent_crime_categories])
data = cursor.fetchall()
violent_crimes_per_district_df = pd.DataFrame(data, columns=['Description', 'Number of Violent Crimes'])
header = violent_crimes_per_district_df.to_html(classes=
'table table-striped table-hover table-condensed table-responsive')
folium.Marker(location=police_station_location, popup=folium.Popup(html="District Number %s - Violent Crimes %s"
%(police_station[2],header))).add_to(violent_crimes_per_district_map);
violent_crimes_per_district_map
Lets first create a dataframe of gun crimes per district first to get an idea about the number of gun crimes per district
gun='%GUN%'
cursor.execute("SELECT district, count(district)\
FROM crimes\
WHERE DESCRIPTION::text LIKE %s GROUP BY district",[gun])
districts_gun_violent_crimes = cursor.fetchall()
districts_gun_violent_crimes_df = pd.DataFrame(districts_gun_violent_crimes, columns=['dist_num','gun_crimes'])
districts_gun_violent_crimes_df['dist_num'] = districts_gun_violent_crimes_df['dist_num'].astype(str)
districts_gun_violent_crimes_df.sort_values(by = "gun_crimes", ascending = False)
districts_gun_violent_crimes_map = folium.Map(location =(41.8781, -87.6298),zoom_start=11)
districts_gun_violent_crimes_map.choropleth(geo_data="Boundaries.geojson",
fill_color='YlOrRd',
fill_opacity=0.5,
line_opacity=1,
# setup legend - for color and values
data = districts_gun_violent_crimes_df,
key_on='feature.properties.dist_num',
columns = ['dist_num', 'gun_crimes'],
legend_name="GUN CRIME"
)
Now, lets create a dataframe of the different types of gun crimes for every district and then plot it on Choropleth map
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)),
district from police_stations where district!='Headquarters'""")
gun='%GUN%'
police_stations = cursor.fetchall()
for police_station in police_stations:
police_station_location = (police_station[0],police_station[1])
cursor.execute("""SELECT DESCRIPTION, count(DESCRIPTION) \
FROM crimes \
WHERE district=%s and DESCRIPTION::text LIKE %s GROUP BY DESCRIPTION""",[police_station[2],gun])
district_gun_violent_crimes=cursor.fetchall()
# setup pop up
district_gun_violent_crimes_df=\
pd.DataFrame(district_gun_violent_crimes, columns=['Description', 'Number of Gun Crime'])
header = district_gun_violent_crimes_df.to_html(classes=
'table table-striped table-hover table-condensed table-responsive')
folium.Marker(location=police_station_location,popup=folium.Popup(html="District No: %s GUN_Crime: %s"
%(police_station[2],header) )).add_to(districts_gun_violent_crimes_map)
districts_gun_violent_crimes_map
district=[]
tarea=[]
with open('Boundaries.geojson') as f:
data = json.load(f)
a = data['features']
for i in range(len(a)):
obj=a[i]['geometry']
n= a[i]['properties']
district.append(n['dist_num'])
tarea.append(area(obj)/10000)
af=pd.DataFrame({'dist_num': district,'district_area_inHectares':tarea})
af['dist_num'] = af['dist_num'].astype(str)
# note merge with crimes_per_district from query #1
final_data= pd.merge(af, crimes_per_district, on='dist_num', how='inner')
final_data['crime_density'] = round(final_data['number_of_crimes']/(final_data['district_area_inHectares']/100))
final_data.sort_values(by = "crime_density", ascending = False)
# area per district
af.head()
gun_crime_arrests_map = folium.Map(location =(41.8781, -87.6298),zoom_start=11)
gun_crime_arrests_map.choropleth(geo_data="Boundaries.geojson",
fill_color='YlOrRd',
fill_opacity=0.5,
line_opacity=1,
# setup legend
data = districts_gun_violent_crimes_df,
key_on='feature.properties.dist_num',
columns = ['dist_num', 'gun_crimes'],
legend_name="GUN CRIME"
)
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)),
district from police_stations where district!='Headquarters'""")
gun='%GUN%'
police_stations = cursor.fetchall()
marker_cluster = MarkerCluster().add_to(gun_crime_arrests_map)
for police_station in police_stations:
police_station_location = (police_station[0],police_station[1])
cursor.execute("""SELECT DISTINCT ON(caseno) caseno, block,DESCRIPTION, count(arrest), arrest,latitude,longitude\
FROM crimes\
WHERE district=%s and DESCRIPTION::text LIKE %s \
GROUP BY caseno,block, DESCRIPTION,arrest, latitude, longitude""",[police_station[2],gun])
crimes_per_district = cursor.fetchall()
for crime in crimes_per_district:
if crime[4]==True:
folium.Marker(location=(crime[5],crime[6]),popup=
folium.Popup(html="District No: %s <br> Description: %s <br> Block: %s"
%(police_station[2],crime[2],crime[1])),icon=folium.Icon(color='green',
icon='ok-sign'),).add_to(marker_cluster)
else:
folium.Marker(location=(crime[5],crime[6]),popup=folium.Popup(html=
"District No: %s <br> Description: %s<br> Block: %s"
%(police_station[2],crime[2],crime[1])),icon=folium.Icon(color='red',
icon='remove-sign'),).add_to(marker_cluster);
# arrest, latitude, longitude, block, description
crime[4], crime[5], crime[6], crime[1], crime[2]
gun_crime_arrests_map
districts_gun_violent_crimes_df\
.sort_values(by = "gun_crimes", ascending = False).head()
farthest_block_gun_crime_map = folium.Map(location =(41.8781, -87.6298),zoom_start=11)
farthest_block_gun_crime_map.choropleth(geo_data="Boundaries.geojson",
fill_color='YlOrRd',
fill_opacity=0.5,
line_opacity=1,
# setup legend
data = districts_gun_violent_crimes_df,
key_on='feature.properties.dist_num',
columns = ['dist_num', 'gun_crimes'],
legend_name="GUN CRIME"
)
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)),
district from police_stations where district!='Headquarters'""")
police_stations = cursor.fetchall()
gun='%GUN%'
for police_station in police_stations:
cursor.execute("""SELECT DISTINCT on (A.block) A.district,A.block, A.where_is,ST_Distance(A.where_is,B.where_is)\
FROM crimes as A, police_stations as B
WHERE ST_Distance(A.where_is,B.where_is) in
( SELECT max(dist) FROM
(SELECT ST_Distance(A.where_is,B.where_is) as dist from crimes as A,\
police_stations as B where A.district=%s
and DESCRIPTION::text LIKE %s and B.district= %s ) as f)""",[police_station[2],gun,police_station[2]])
farthest_block_gun_crime = cursor.fetchall()
cursor.execute("SELECT ST_X(ST_AsText(%s)), ST_Y(ST_AsText(%s))"
,(farthest_block_gun_crime[0][2],farthest_block_gun_crime[0][2]))
farthest_block_gun_crime_location = cursor.fetchall()
folium.Marker(location=(police_station[0],police_station[1]),popup=
folium.Popup(html="Police Station <br> District No.:%s <br> Farthest Gun_Crime Block:%s"%\
(farthest_block_gun_crime[0][0],farthest_block_gun_crime[0][1]))).add_to(farthest_block_gun_crime_map);
folium.CircleMarker(farthest_block_gun_crime_location[0],radius=5,color='#ff3187',popup=
folium.Popup(html="District No.:%s <br> Block:%s"\
%(farthest_block_gun_crime[0][0],farthest_block_gun_crime[0][1]))).add_to(farthest_block_gun_crime_map);
farthest_block_gun_crime[0], farthest_block_gun_crime[0][1]
farthest_block_gun_crime_map
from tabulate import tabulate
# behind the scenes of query 6
gun='%GUN%'
for police_station in police_stations:
cursor.execute("""SELECT DISTINCT on (A.block) A.district,A.block, A.where_is,ST_Distance(A.where_is,B.where_is)\
FROM crimes as A, police_stations as B
WHERE ST_Distance(A.where_is,B.where_is) in
( SELECT max(dist) FROM
(SELECT ST_Distance(A.where_is,B.where_is) as dist from crimes as A,\
police_stations as B where A.district=%s
and DESCRIPTION::text LIKE %s and B.district= %s ) as f)""",[police_station[2],gun,police_station[2]])
data = cursor.fetchall()
colnames = [desc[0] for desc in cursor.description]
farthest_block_gun_crime = pd.DataFrame(data, columns = colnames)
print(tabulate(farthest_block_gun_crime, headers = 'keys', tablefmt = 'psql'))
print() # show the district and block info
cursor.execute("SELECT ST_X(ST_AsText(%s)), ST_Y(ST_AsText(%s))"
,(data[0][2],data[0][2]))
farthest_block_gun_crime_location = cursor.fetchall()
print(\
f"Pin placement: {farthest_block_gun_crime_location[0]}")
print()
Educational: more info about the assignment
# uncomment and run this code when your transactions get "stuck"
#db_connection.rollback()
# get the names of the tables in chicago_crimes_ut database
query = '''
SELECT table_name
FROM information_schema.tables
WHERE table_type = 'BASE TABLE'
AND table_schema = 'public'
'''
cursor.execute(query)
cursor.fetchall()
# Get the number of records in the police_stations
query = 'SELECT count(*) FROM police_stations'
cursor.execute(query)
cursor.fetchall()
# The location of the police station is repeated in various ways:
# x_coordinate, y_coordinate, location, latitude, longitude, and Where_IS (as a Point object)
# Data Dictionary for crimes data below
query = 'SELECT x_coordinate, y_coordinate, location, latitude, longitude,\
ST_X(ST_AsText(Where_IS)) AS X, ST_Y(ST_AsText(Where_IS)) AS Y, district\
FROM police_stations'
cursor.execute(query)
data = cursor.fetchall()
colnames = [desc[0] for desc in cursor.description]
police_stations_locations = pd.DataFrame(data, columns = colnames)
police_stations_locations.head(2)
police_stations_locations.tail(2)
# Get the crimes table
query = 'SELECT * FROM crimes'
cursor.execute(query)
data = cursor.fetchall()
colnames = [desc[0] for desc in cursor.description]
crimes = pd.DataFrame(data, columns = colnames)
##crimes.to_csv("C:\\Users\\asidd\\Desktop\\MSDS\\420 Database Systems\\Lecture 6\\crimes.csv")
crimes.columns
Data Dictionary (crimes): https://www.kaggle.com/currie32/crimes-in-chicago
ID - Unique identifier for the record.
Case Number - The Chicago Police Department RD Number (Records Division Number), which is unique to the incident.
Date - Date when the incident occurred. this is sometimes a best estimate.
Block - The partially redacted address where the incident occurred, placing it on the same block as the actual address.
IUCR - The Illinois Unifrom Crime Reporting code. This is directly linked to the Primary Type and Description. See the list of IUCR codes at https://data.cityofchicago.org/d/c7ck-438e.
Primary Type - The primary description of the IUCR code.
Description - The secondary description of the IUCR code, a subcategory of the primary description.
Location Description - Description of the location where the incident occurred.
Arrest - Indicates whether an arrest was made.
Domestic - Indicates whether the incident was domestic-related as defined by the Illinois Domestic Violence Act.
Beat - Indicates the beat where the incident occurred. A beat is the smallest police geographic area – each beat has a dedicated police beat car. Three to five beats make up a police sector, and three sectors make up a police district. The Chicago Police Department has 22 police districts. See the beats at https://data.cityofchicago.org/d/aerh-rz74.
District - Indicates the police district where the incident occurred. See the districts at https://data.cityofchicago.org/d/fthy-xz3r.
Ward - The ward (City Council district) where the incident occurred. See the wards at https://data.cityofchicago.org/d/sp34-6z76.
Community Area - Indicates the community area where the incident occurred. Chicago has 77 community areas. See the community areas at https://data.cityofchicago.org/d/cauq-8yn6.
FBI Code - Indicates the crime classification as outlined in the FBI's National Incident-Based Reporting System (NIBRS). See the Chicago Police Department listing of these classifications at http://gis.chicagopolice.org/clearmap_crime_sums/crime_types.html.
X Coordinate - The x coordinate of the location where the incident occurred in State Plane Illinois East NAD 1983 projection. This location is shifted from the actual location for partial redaction but falls on the same block.
Y Coordinate - The y coordinate of the location where the incident occurred in State Plane Illinois East NAD 1983 projection. This location is shifted from the actual location for partial redaction but falls on the same block.
Year - Year the incident occurred.
Updated On - Date and time the record was last updated.
Latitude - The latitude of the location where the incident occurred. This location is shifted from the actual location for partial redaction but falls on the same block.
Longitude - The longitude of the location where the incident occurred. This location is shifted from the actual location for partial redaction but falls on the same block.
Location - The location where the incident occurred in a format that allows for creation of maps and other geographic operations on this data portal. This location is shifted from the actual location for partial redaction but falls on the same block.
# Get the first few records of the crimes table
crimes.head(2)
# Get the last few records of the crimes table
crimes.tail(2)
# what kinds of crimes were commited
crimes ['primary_type'].value_counts().head()
# what was the location of the crimes
crimes['location_description'].value_counts().head()
# look at the date range
cursor.execute('SELECT date_of_occurrence FROM crimes')
rows = cursor.fetchall()
# check out dates
crime_dates = pd.DataFrame(rows, columns = ['date_of_occurrence'])
cd = crime_dates.sort_values(by = 'date_of_occurrence', ascending = False)
cd.head(2)
# were arrests made
crimes['arrest'].value_counts()
# let's see arrests and primary_type of crimes
pd.crosstab(crimes['primary_type'], crimes['arrest'], margins = True)
# Get the police _stations table
cursor.execute('SELECT * FROM police_stations')
data = cursor.fetchall()
colnames = [desc[0] for desc in cursor.description]
police = pd.DataFrame(data, columns = colnames)
#police.to_csv("C:\\Users\\asidd\\Desktop\\MSDS\\420 Database Systems\\Lecture 6\\police.csv")
police.columns
cursor.description
# look at sample data from police_table
police.head(2)
police.tail(2)
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)),
district from police_stations where district!='Headquarters'""")
police_stations = cursor.fetchall()
colnames = [desc[0] for desc in cursor.description]
police_stations_df = pd.DataFrame(police_stations, columns = colnames)
#police_stations_df.to_csv("C:\\Users\\asidd\\Desktop\\MSDS\\420 Database Systems\\Lecture 6\\police_stations.csv")
police_stations_df.head()
# Get the spatial ref table
query = 'SELECT * FROM spatial_ref_sys'
cursor.execute(query)
data = cursor.fetchall()
colnames = [desc[0] for desc in cursor.description]
spatial = pd.DataFrame(data, columns = colnames)
spatial.columns
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)),
district from police_stations where district!='Headquarters'""")
police_stations = cursor.fetchall()
for police_station in police_stations:
police_station_location =(police_station[0],police_station[1])
cursor.execute("SELECT *\
FROM crimes where district =%s"
,[police_station[2]])
data = cursor.fetchall()
colnames = [desc[0] for desc in cursor.description]
df_0 = pd.DataFrame(data, columns = colnames)
#df_0.to_csv("C:\\Users\\asidd\\Desktop\\MSDS\\420 Database Systems\\Lecture 6\\df_0.csv")
df_0.head()
spatial.head(2)
spatial.tail(2)
http://postgis.net/docs/manual-1.4/ch04.html
The SPATIAL_REF_SYS columns are as follows:
SRID An integer value that uniquely identifies the Spatial Referencing System (SRS) within the database.
AUTH_NAME The name of the standard or standards body that is being cited for this reference system. For example, "EPSG" would be a valid AUTH_NAME.
AUTH_SRID The ID of the Spatial Reference System as defined by the Authority cited in the AUTH_NAME. In the case of EPSG, this is where the EPSG projection code would go.
SRTEXT The Well-Known Text representation of the Spatial Reference System.
PROJ4TEXT PostGIS uses the Proj4 library to provide coordinate transformation capabilities. The PROJ4TEXT column contains the Proj4 coordinate definition string for a particular SRID.
The PDF document your are submitting must have the source code and the output for the following requirements
#long, lat, and district of police station
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)),
district from police_stations where district!='Headquarters'""")
police_stations = cursor.fetchall()
# Lets first create a dataframe of gun crimes per district and block
# first to get an idea about the number of gun crimes per block
gun ='%GUN%'
appended_data = []
for police_station in police_stations:
police_station_location = (police_station[0],police_station[1])
cursor.execute("""SELECT district, block, count(district) AS gun_crimes\
FROM crimes\
WHERE district=%s and DESCRIPTION::text LIKE %s\
GROUP BY district, block\
HAVING count(district) =\
(SELECT count(district) AS gun_crimes\
FROM crimes\
WHERE district=%s and DESCRIPTION::text LIKE %s\
GROUP BY block\
ORDER BY gun_crimes DESC\
LIMIT 1)""",[police_station[2],gun,
police_station[2],gun])
data_1 = cursor.fetchall()
colnames = [desc[0] for desc in cursor.description]
df_1 = pd.DataFrame(data_1, columns = colnames)
df_1['district'] = df_1['district'].astype(str)
df_1['block'] = df_1['block'].astype(str)
df_1['gun_crimes'] = df_1['gun_crimes'].astype(int)
appended_data.append(df_1)
appended_data = pd.concat(appended_data)
df_1 = appended_data
## df_1.info()
## Highest number of gun crimes by block per district
df_1.tail()
# If the df_1 does not have one record per district, then we need to make sure that
# we keep the highest block count for each district.
df_1map = df_1.loc[:, ['district','gun_crimes']].drop_duplicates()
df_1map
# checking if the gun crime counts are in integer
df_1map.info()
data_map= folium.Map(location =(41.8781, -87.6298),zoom_start=11)
data_map.choropleth(geo_data="Boundaries.geojson",
fill_color='YlOrRd',
fill_opacity=0.5,
line_opacity=1,
# legend
data = df_1map,
key_on='feature.properties.dist_num',
columns = ['district','gun_crimes'],
legend_name="Highest number of gun crimes by block per district"
)
# Now, lets create a dataframe of the different types of gun crimes for
# every block and then plot it on Choropleth map
#long, lat, and district of police station
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)),
district from police_stations where district!='Headquarters'""")
police_stations = cursor.fetchall()
gun='%GUN%'
for police_station in police_stations:
police_station_location = (police_station[0],police_station[1])
cursor.execute("""SELECT district, block, count(district) AS gun_crimes\
FROM crimes\
WHERE district=%s and DESCRIPTION::text LIKE %s\
GROUP BY district, block\
HAVING count(district) =\
(SELECT count(district) AS gun_crimes\
FROM crimes\
WHERE district=%s and DESCRIPTION::text LIKE %s\
GROUP BY block\
ORDER BY gun_crimes DESC\
LIMIT 1)""",[police_station[2],gun,
police_station[2],gun])
data = cursor.fetchall()
# setup pop up
df=\
pd.DataFrame(data, columns=['District','Block','Number of Gun Crime'])
header = df.to_html(classes=
'table table-striped table-hover table-condensed table-responsive')
folium.Marker(location = police_station_location, popup = folium.Popup(html = "District No: %s GUN_Crime: %s"
%(police_station[2],header) )).add_to(data_map);
data_map
district=[]
tarea=[]
with open('Boundaries.geojson') as f:
data = json.load(f)
a = data['features']
for i in range(len(a)):
obj=a[i]['geometry']
n= a[i]['properties']
district.append(n['dist_num'])
tarea.append(area(obj)/10000)
af=pd.DataFrame({'dist_num': district,'district_area_inHectares':tarea})
af['dist_num'] = af['dist_num'].astype(str)
# note merge with crimes_per_district from query #1
#final_data= pd.merge(af, crimes_per_district, on='dist_num', how='inner')
final_data= pd.merge(af, districts_gun_violent_crimes_df, on='dist_num', how='inner')
final_data['crime_density'] = round(final_data['gun_crimes']/(final_data['district_area_inHectares']/100))
final_data.sort_values(by = "crime_density", ascending = False)
# data exploration
cursor.execute("""SELECT * FROM crimes
WHERE DESCRIPTION::text LIKE '%UNLAWFUL POSS OF HANDGUN%'""")
x = cursor.fetchall()
colnames = [desc[0] for desc in cursor.description]
z = pd.DataFrame(x, columns = colnames)
# are we extracting right description
z.head(2)
# testing
z['district'].value_counts()
gun='%UNLAWFUL POSS OF HANDGUN%'
cursor.execute("SELECT district, count(district)\
FROM crimes\
WHERE DESCRIPTION::text LIKE %s GROUP BY district",[gun])
districts_gun_violent_crimes = cursor.fetchall()
districts_gun_violent_crimes_df = pd.DataFrame(districts_gun_violent_crimes, columns=['dist_num','gun_crimes'])
districts_gun_violent_crimes_df['dist_num'] = districts_gun_violent_crimes_df['dist_num'].astype(str)
districts_gun_violent_crimes_df.sort_values(by = "gun_crimes", ascending = False)
farthest_unlawful_poss_handgun_map = folium.Map(location =(41.8781, -87.6298),zoom_start=11)
farthest_unlawful_poss_handgun_map.choropleth(geo_data="Boundaries.geojson",
fill_color='YlOrRd',
fill_opacity=0.5,
line_opacity=1,
# setup legend
data = districts_gun_violent_crimes_df,
key_on='feature.properties.dist_num',
columns = ['dist_num', 'gun_crimes'],
legend_name="GUN CRIME"
)
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)),
district from police_stations where district!='Headquarters'""")
police_stations = cursor.fetchall()
gun='%UNLAWFUL POSS OF HANDGUN%'
for police_station in police_stations:
cursor.execute("""SELECT DISTINCT on (A.block) A.district,A.block, A.where_is,ST_Distance(A.where_is,B.where_is)\
FROM crimes as A, police_stations as B
WHERE ST_Distance(A.where_is,B.where_is) in
(SELECT max(dist)
FROM
(SELECT ST_Distance(A.where_is,B.where_is) as dist
FROM crimes as A, police_stations as B
WHERE A.district=%s
AND DESCRIPTION::text LIKE %s
AND B.district= %s ) as f)""",[police_station[2],gun,police_station[2]])
farthest_unlawful_poss_handgun = cursor.fetchall()
if(len(farthest_unlawful_poss_handgun)>0):
##farthest_unlawful_poss_handgun[0][2]
cursor.execute("SELECT ST_X(ST_AsText(%s)), ST_Y(ST_AsText(%s))"
,(farthest_unlawful_poss_handgun[0][2],farthest_unlawful_poss_handgun[0][2]))
farthest_unlawful_poss_handgun_location = cursor.fetchall()
folium.Marker(location=(police_station[0],police_station[1]),popup=
folium.Popup(html="Police Station <br> District No.:%s <br> Farthest Gun_Crime Block:%s"%\
(farthest_unlawful_poss_handgun[0][0],farthest_unlawful_poss_handgun[0][1]))).add_to(farthest_unlawful_poss_handgun_map)
folium.CircleMarker(farthest_unlawful_poss_handgun_location[0],radius=5,color='#ff3187',popup=
folium.Popup(html="District No.:%s <br> Block:%s"\
%(farthest_unlawful_poss_handgun[0][0],farthest_unlawful_poss_handgun[0][1]))).add_to(farthest_unlawful_poss_handgun_map)
# It is possible for the query to return no records
len(farthest_unlawful_poss_handgun)>0
gun='%UNLAWFUL POSS OF HANDGUN%'
cursor.execute("""SELECT DISTINCT on (A.block) A.district,A.block, A.where_is,ST_Distance(A.where_is,B.where_is)\
FROM crimes as A, police_stations as B
WHERE ST_Distance(A.where_is,B.where_is) in
(SELECT max(dist)
FROM
(SELECT ST_Distance(A.where_is,B.where_is) as dist
FROM crimes as A, police_stations as B
WHERE A.district=%s
AND DESCRIPTION::text LIKE %s
AND B.district= %s ) as f)""",[police_station[2],gun,police_station[2]])
farthest_unlawful_poss_handgun = cursor.fetchall()
if(len(farthest_unlawful_poss_handgun)>0):
farthest_unlawful_poss_handgun[0][2]
farthest_unlawful_poss_handgun_map
gun='%GUN%'
residence = 'RESIDENCE'
street= '%STREET%'
cursor.execute("SELECT location_description, district, count(district)\
FROM crimes\
WHERE DESCRIPTION::text LIKE %s\
AND (location_description::text LIKE %s\
OR location_description::text LIKE %s)\
GROUP BY district, location_description",[gun, residence, street])
districts_gun_violent_crimes = cursor.fetchall()
districts_gun_violent_crimes_df = pd.DataFrame(districts_gun_violent_crimes, columns=['location_description','dist_num','gun_crimes'])
districts_gun_violent_crimes_df['dist_num'] = districts_gun_violent_crimes_df['dist_num'].astype(str)
districts_gun_violent_crimes_df.sort_values(by = "gun_crimes", ascending = False)
gun_crime_arrests_map = folium.Map(location =(41.8781, -87.6298),zoom_start=11)
gun_crime_arrests_map.choropleth(geo_data="Boundaries.geojson",
fill_color='YlOrRd',
fill_opacity=0.5,
line_opacity=1,
# setup legend
data = districts_gun_violent_crimes_df,
key_on='feature.properties.dist_num',
columns = ['dist_num', 'gun_crimes'],
legend_name="GUN CRIME"
)
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)),
district from police_stations where district!='Headquarters'""")
gun='%GUN%'
police_stations = cursor.fetchall()
marker_cluster = MarkerCluster().add_to(gun_crime_arrests_map)
for police_station in police_stations:
police_station_location = (police_station[0],police_station[1])
cursor.execute("""SELECT DISTINCT ON(caseno) caseno, block, location_description, count(DESCRIPTION), DESCRIPTION, latitude,longitude \
FROM crimes \
WHERE district=%s and DESCRIPTION::text LIKE %s
GROUP BY caseno, block, location_description, DESCRIPTION, latitude,longitude""",[police_station[2],gun])
crimes_per_district = cursor.fetchall()
for crime in crimes_per_district:
if crime[2] == 'RESIDENCE':
folium.Marker(location=(crime[5],crime[6]),popup=
folium.Popup(html="District No: %s <br> Description: %s <br> Block: %s"
%(police_station[2], crime[1], crime[4])),icon=folium.Icon(color='green',
icon='ok-sign'),).add_to(marker_cluster)
elif crime[2]=='STREET':
folium.Marker(location=(crime[5],crime[6]),popup=folium.Popup(html=
"District No: %s <br> Description: %s<br> Block: %s"
%(police_station[2], crime[1], crime[4])),icon=folium.Icon(color='red',
icon='remove-sign'),).add_to(marker_cluster)
# arrest, latitude, longitude, block, description
crime[0], crime[1], crime[2], crime[3], crime[4], crime[5], crime[6]
gun_crime_arrests_map